The Curse of Optimism

Author

  • Stuart I. Reynolds
Abstract

Stuart I. Reynolds, sir@cs.bham.ac.uk
School of Computer Science, The University of Birmingham, Birmingham, B15 2TT, UK

1. The max Operator and Optimistic Initial Biases

This paper introduces the following simple insight: RL algorithms that update their value estimates based upon a return estimate involving max_a Q(s, a) find it more difficult to overcome their initial value biases if these biases are optimistic. To see that this is so, consider the example in Figure 1. Assume that all transitions yield a reward of 0. A learning algorithm is applied that adjusts Q(s1, a1) towards E[max_a Q(s2, a)]. If all Q-values are initialised optimistically, to 10 for example, then the Q-values of all actions in s2 must be readjusted (i.e. lowered towards zero) before Q(s1, a1) may be lowered. However, if the Q-values are initialised pessimistically by the same amount (to -10), then max_a Q(s2, a) is raised when the value of a single action in s2 is raised. In turn, Q(s1, a1) may then also be raised.

In general, it is clear that it is easier for RL algorithms employing max_a Q(s, a) in their return estimates to raise their Q-value predictions than to lower them. In effect, the max operator causes a resistance to change in value updates that can inhibit learning. It is also clear that the effect of this is further compounded if i) the Q-values in s2 are themselves based upon the over-optimistic values of their successors, or, ii) states have many actions available, and so many Q-values to adjust before max_a Q(s, a) may change.

Examples of methods that use max_a Q(s, a) in their return estimates and are affected by this phenomenon are: value-iteration (Sutton & Barto, 1998), Q-learning (Watkins, 1989), R-learning (Schwartz, 1993), Watkins' Q(λ) (Watkins, 1989; Sutton & Barto, 1998), Peng and Williams' Q(λ) (Peng & Williams, 1996) and their many derivatives.

Experimental results with value-iteration and Q-learning have shown that, when the effects of optimism on the agent's exploration strategy are accounted for, convergence towards the optimal Q-function gener...

[Figure 1 (not reproduced); surviving label: Q(s, a) = 10]
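The asymmetry described in the abstract can be illustrated with a minimal sketch. It is not the author's experiment: the function name updates_before_change is hypothetical, and the sketch assumes that every action in s2 leads to a terminal state of value 0, that actions in s2 are updated one at a time in a fixed order, and that the step size is 0.5 with no discounting. It counts how many s2 updates are needed before Q(s1, a1) moves away from its initial value.

```python
def updates_before_change(init, n_actions=4, alpha=0.5, gamma=1.0):
    """Count s2 updates needed before Q(s1, a1) first moves.

    Assumptions (not from the source): all rewards are 0, every action
    in s2 ends the episode (successor value 0), and the s2 actions are
    updated one by one in a fixed order.
    """
    q_s1 = init                  # Q(s1, a1)
    q_s2 = [init] * n_actions    # Q(s2, a) for every action a

    for k in range(n_actions):
        # Update one action in s2 towards its true value of 0.
        q_s2[k] += alpha * (0.0 - q_s2[k])

        # Update Q(s1, a1) towards reward + gamma * max_a Q(s2, a).
        target = 0.0 + gamma * max(q_s2)
        new_q_s1 = q_s1 + alpha * (target - q_s1)

        if new_q_s1 != q_s1:
            return k + 1         # number of s2 updates performed so far
        q_s1 = new_q_s1

    return None                  # Q(s1, a1) never moved


print(updates_before_change(10))    # optimistic start  -> 4
print(updates_before_change(-10))   # pessimistic start -> 1
```

Under these assumptions the optimistic start needs one update per action in s2 before the max can fall, whereas the pessimistic start needs only a single update; raising n_actions widens the gap, in line with point ii) of the abstract.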


Similar resources

A Case Report of Ondine's Curse

ABSTRACT Pulmonary hypertension (PH) is one of the causes of cyanosis in children. Pulmonary hypertension may be primary or secondary. The etiologies of secondary PH are variable, and one of them is abnormal pulmonary ventilation. One of the rare causes of secondary PH is congenital central hypoventilation syndrome, or Ondine's curse. The patient was a 6-year-old boy who presented with easy fati...


Sleep Apnea Syndrome after Posterior Fossa Surgery: A Case of Acquired Ondine's Curse

Introduction: Ondine’s Curse is a catastrophic but rare condition in adults. It is referred to as a congenital or acquired condition, in which the patient cannot breathe automatically while asleep. Acquired causes of this disease can be any cause affecting the ventrolateral part of the medulla, which is considered to be the breathing center in humans. Case Report: A 51-year-old woman, with...


Income Inequality and the Oil Curse: The Case of Oil-Rich Developing Countries

While most literature on the natural resource curse highlights its effect on the growth rate and the level of income, this paper shifts the focus toward the effect of oil dependence on the distribution of income in oil-rich developing countries (including Iran and 18 other countries). Moreover, the paper studies the impact of institutional quality and the interaction effect of different institution...


Institutional Quality and Curse Resources: An Experimental Study on OPEC Countries

This paper studies the resource curse using annual data from 2002 to 2016 for members of the Organization of the Petroleum Exporting Countries (OPEC), i.e. Algeria, Iran, Kuwait, Nigeria, Qatar, Saudi Arabia, United Arab Emirates and Venezuela. For this purpose, it considers the interaction between resource abundance and institutional quality, and their marginal effect of the countrie...


Manager Optimism Based on Environmental Uncertainty and Accounting Conservatism

It is expected that more accounting conservatism (environmental uncertainty) reduces manager optimism. Prior research, however, has struggled to establish this relation empirically. Moreover, some evidence points to the possibility that manager optimism is lower for firms with more accounting conservatism. In this paper, the author examines the link between accounting conservatism, environme...


The Curse of Wealth – Middle Eastern Countries Need to Address the Rapidly Rising Burden of Diabetes

The energy boom of the last decade has led to rapidly increasing wealth in the Middle East, particularly in the oil and gas-rich Gulf Cooperation Council (GCC) countries. This exceptional growth in prosperity has brought with it rapid changes in lifestyles that have resulted in a significant rise in chronic disease. In particular, the number of people diagnosed with diabetes has increased dramat...




Publication date: 2007